NSF PAR Search | NSF Public Access Repository


PIDformer: Transformer Meets Control Theory

Nguyen, Tam; Uribe, César A; Nguyen, Tan M; Baraniuk, R G (July 2024, International Conference on Machine Learning (ICML))

Full Text Available
PIDformer: Transformer Meets Control Theory

Nguyen, Tam; Uribe, Cesar A; Nguyen, Tan M; Baraniuk, Richard G (July 2024, JMLR.org)

Full Text Available
Mitigating Over-smoothing in Transformers via Regularized Nonlocal Functionals

Nguyen, Tam; Nguyen, Tan M; Baraniuk, Richard (December 2023, Advances in Neural Information Processing Systems (NeurIPS))

Full Text Available
Improving Transformers with Probabilistic Attention Keys

Tam Nguyen, Tan M. (July 2022, Proceedings of the 39th International Conference on Machine Learning, Baltimore, Maryland, USA)

Full Text Available
MomentumRNN: Integrating Momentum into Recurrent Neural Networks

Nguyen, Tan M.; Baraniuk, Richard G.; Bertozzi, Andrea L.; Osher, Stanley J.; Wang, Bao (December 2020, Advances in Neural Information Processing Systems)
Designing deep neural networks is an art that often involves an expensive search over candidate architectures. To overcome this for recurrent neural nets (RNNs), we establish a connection between the hidden state dynamics in an RNN and gradient descent (GD). We then integrate momentum into this framework and propose a new family of RNNs, called MomentumRNNs. We theoretically prove and numerically demonstrate that MomentumRNNs alleviate the vanishing gradient issue in training RNNs. We study the momentum long short-term memory (MomentumLSTM) and verify its advantages in convergence speed and accuracy over its LSTM counterpart across a variety of benchmarks. We also demonstrate that MomentumRNN is applicable to many types of recurrent cells, including those in the state-of-the-art orthogonal RNNs. Finally, we show that other advanced momentum-based optimization methods, such as Adam and Nesterov accelerated gradients with a restart, can be easily incorporated into the MomentumRNN framework for designing new recurrent cells with even better performance.
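As a rough illustration of the idea in this abstract, the sketch below contrasts a plain recurrent update with one that carries a momentum (velocity) state. The abstract gives no equations, so the update form used here (v_t = mu * v_{t-1} + s * W x_t, then h_t = tanh(U h_{t-1} + v_t)), the tanh activation, the function names, and the default values of mu and s are all assumptions made for illustration, not the authors' implementation.

```python
import math

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def rnn_step(h, x, U, W):
    """Plain recurrent update: h_t = tanh(U h_{t-1} + W x_t)."""
    pre = [a + b for a, b in zip(matvec(U, h), matvec(W, x))]
    return [math.tanh(p) for p in pre]

def momentum_rnn_step(h, v, x, U, W, mu=0.6, s=0.6):
    """Momentum-style update (assumed form, see lead-in):
        v_t = mu * v_{t-1} + s * W x_t
        h_t = tanh(U h_{t-1} + v_t)
    mu (momentum) and s (step size) are hypothetical hyperparameters.
    """
    v_new = [mu * vi + s * wi for vi, wi in zip(v, matvec(W, x))]
    pre = [a + b for a, b in zip(matvec(U, h), v_new)]
    h_new = [math.tanh(p) for p in pre]
    return h_new, v_new

def run_momentum_rnn(xs, U, W, mu=0.6, s=0.6):
    """Unroll the momentum cell over a sequence, starting from zero states."""
    h = [0.0] * len(U)
    v = [0.0] * len(U)
    for x in xs:
        h, v = momentum_rnn_step(h, v, x, U, W, mu, s)
    return h, v
```

With mu = 0 and s = 1 the velocity reduces to W x_t, so this sketch falls back to the plain recurrent update; nonzero mu lets input contributions from earlier steps persist in v.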
Full Text Available
MomentumRNN: Integrating Momentum into Recurrent Neural Networks

Nguyen, Tan M.; Bertozzi, Andrea L; Osher, Stanley J; Wang, Bao (January 2020, 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada)

Full Text Available
